14 research outputs found

    Ranking, Labeling, and Summarizing Short Text in Social Media

    Get PDF
    One of the key features driving the growth and success of the Social Web is large-scale participation through user-contributed content – often through short text in social media. Unlike traditional long-form documents – e.g., Web pages, blog posts – these short text resources are typically quite brief (on the order of 100s of characters), often of a personal nature (reflecting opinions and reactions of users), and being generated at an explosive rate. Coupled with this explosion of short text in social media is the need for new methods to organize, monitor, and distill relevant information from these large-scale social systems, even in the face of the inherent “messiness” of short text, considering the wide variability in quality, style, and substance of short text generated by a legion of Social Web participants. Hence, this dissertation seeks to develop new algorithms and methods to ensure the continued growth of the Social Web by enhancing how users engage with short text in social media. Concretely, this dissertation takes a three-fold approach: First, this dissertation develops a learning-based algorithm to automatically rank short text comments associated with a Social Web object (e.g., Web document, image, video) based on the expressed preferences of the community itself, so that low-quality short text may be filtered and user attention may be focused on highly-ranked short text. Second, this dissertation organizes short text through labeling, via a graph- based framework for automatically assigning relevant labels to short text. In this way meaningful semantic descriptors may be assigned to short text for improved classification, browsing, and visualization. Third, this dissertation presents a cluster-based summarization approach for extracting high-quality viewpoints expressed in a collection of short text, while maintaining diverse viewpoints. By summarizing short text, user attention may quickly assess the aggregate viewpoints expressed in a collection of short text, without the need to scan each of possibly thousands of short text items

    Alexandria: Extensible Framework for Rapid Exploration of Social Media

    Full text link
    The Alexandria system under development at IBM Research provides an extensible framework and platform for supporting a variety of big-data analytics and visualizations. The system is currently focused on enabling rapid exploration of text-based social media data. The system provides tools to help with constructing "domain models" (i.e., families of keywords and extractors to enable focus on tweets and other social media documents relevant to a project), to rapidly extract and segment the relevant social media and its authors, to apply further analytics (such as finding trends and anomalous terms), and visualizing the results. The system architecture is centered around a variety of REST-based service APIs to enable flexible orchestration of the system capabilities; these are especially useful to support knowledge-worker driven iterative exploration of social phenomena. The architecture also enables rapid integration of Alexandria capabilities with other social media analytics system, as has been demonstrated through an integration with IBM Research's SystemG. This paper describes a prototypical usage scenario for Alexandria, along with the architecture and key underlying analytics.Comment: 8 page

    Mapping 123 million neonatal, infant and child deaths between 2000 and 2017

    Get PDF
    Since 2000, many countries have achieved considerable success in improving child survival, but localized progress remains unclear. To inform efforts towards United Nations Sustainable Development Goal 3.2—to end preventable child deaths by 2030—we need consistently estimated data at the subnational level regarding child mortality rates and trends. Here we quantified, for the period 2000–2017, the subnational variation in mortality rates and number of deaths of neonates, infants and children under 5 years of age within 99 low- and middle-income countries using a geostatistical survival model. We estimated that 32% of children under 5 in these countries lived in districts that had attained rates of 25 or fewer child deaths per 1,000 live births by 2017, and that 58% of child deaths between 2000 and 2017 in these countries could have been averted in the absence of geographical inequality. This study enables the identification of high-mortality clusters, patterns of progress and geographical inequalities to inform appropriate investments and implementations that will help to improve the health of all populations

    Summarizing User-Contributed Comments

    No full text
    User-contributed comments are one of the hallmarks of the Social Web, widely adopted across social media sites and mainstream news providers alike. While comments encourage higher-levels of user engagement with online media, their wide success places new burdens on users to process and assimilate the perspectives of a huge number of user-contributed perspectives. Toward overcoming this problem we study in this paper the comment summarization problem: for a set of n user-contributed comments associated with an online resource, select the best top-k comments for summarization. In this paper we propose (i) a clustering-based approach for identifying correlated groups of comments; and (ii) a precedence-based ranking framework for automatically selecting informative user-contributed comments. We find that in combination, these two salient features yield promising results

    Probabilistic generative models of the social annotation process

    No full text
    Abstract—With the growth in the past few years of social tagging services like Delicious and CiteULike, there is growing interest in modeling and mining these social systems for deriving implicit social collective intelligence. In this paper, we propose and explore two probabilistic generative models of the social annotation (or tagging) process with an emphasis on user participation. These models leverage the inherent social communities implicit in these tagging services. We compare the proposed models to two prominent probabilistic topic models (Latent Dirichlet Allocation and Pachinko Allocation) via an experimental study of the popular Delicious tagging service. We find that the proposed community-based annotation models identify more coherent implicit structures than the alternatives and are better suited to handle unseen social annotation data. I
    corecore